Parallel Probabilistic Inference on Cache-coherent Multiprocessors

Authors

  • Alexander V. Kozlov
  • Jaswinder Pal Singh
Abstract

Probabilistic inference is a technique used in expert systems for reasoning under uncertainty. A typical inference task is to determine the probability of some events (say diseases) given evidence about other events (say findings). Inference is conveniently represented as the propagation of evidence in a graph called a belief network. Probabilistic inference is computationally very expensive (NP-hard in general), but is very important both in itself and to calibrate less expensive approximate schemes. It is therefore natural to explore speeding up inference by exploiting parallelism. Probabilistic inference affords concurrency at two different levels and presents interesting tradeoffs between load balance and data locality. We present two parallel implementations of probabilistic inference in belief networks. One uses a static assignment of work to processors but sacrifices some available concurrency, while the other uses dynamic assignment to obtain both forms of concurrency but sacrifices some types of data locality. We provide detailed performance measurements and analysis for both implementations on two cache-coherent shared-address-space multiprocessors: a 32-processor Stanford DASH multiprocessor with physically distributed main memory, and a 16-processor bus-based SGI Challenge XL with centralized main memory. We find that the static assignment scheme produces better results uniformly over all input networks on both machines since it has better locality properties. To understand scalability with problem size, number of processors and cache organization, we perform software simulations of the multiprocessor execution and provide a detailed characterization of the spatial and temporal data locality. We find that temporal locality is low due to both the large size of the working sets and low reuse of data, while spatial locality is high. Our results suggest that probabilistic inference is both successful on moderate-scale cache-coherent machines and a good benchmark for testing the memory and communication architectures of these machines, due to its emphasis on data locality.

Similar resources

Automatic Parallelization for Non-cache Coherent Multiprocessors

Although much work has been done on parallelizing compilers for cache-coherent shared-memory multiprocessors and message-passing multiprocessors, there is relatively little research on parallelizing compilers for non-cache-coherent multiprocessors with a global address space. In this paper, we present a preliminary study on automatic parallelization for the Cray T3D, a commercial scalable machine ...

Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors

This document describes a set of new techniques for improving the efficiency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared memory) multiprocessors. The key component used in this scheme is a data flow framework that exploits information about array access patterns and about the cache coherence protocol to predict at compile...

Software Caching on Cache-Coherent Multiprocessors

Programmers have always been concerned with data distribution and remote memory access costs on shared-memory multiprocessors that lack coherent caches, like the BBN Butterfly. Recently memory latency has become an important issue on cache-coherent multiprocessors, where dramatic improvements in microprocessor performance have increased the relative cost of cache misses and coherency transaction...

Parallel Hierarchical Radiosity On Cache-Coherent Multiprocessors

Computing radiosity is a computationally very expensive problem in computer graphics. Recent hierarchical methods have greatly speeded up the computation of first diffuse and now also specular radiosity. We present a parallel algorithm for computing both diffuse and specular radiosity together, and examine its performance in detail on cache-coherent shared address space multiprocessors. We comp...

A Preliminary Evaluation of Cache-miss-initiated Prefetching Techniques in Scalable Multiprocessors

Prefetching is an important technique for reducing the average latency of memory accesses in scalable cache-coherent multiprocessors. Aggressive prefetching can significantly reduce the number of cache misses, but may introduce bursty network and memory traffic, and increase data sharing and cache pollution. Given that we anticipate enormous increases in both network bandwidth and latency, we exam...

Publication date: 1996